Journals
  Publication Years
  Keywords
Search within results Open Search
Please wait a minute...
For Selected: Toggle Thumbnails
Gene splice site identification based on BERT and CNN
Min ZUO, Hong WANG, Wenjing YAN, Qingchuan ZHANG
Journal of Computer Applications    2023, 43 (10): 3309-3314.   DOI: 10.11772/j.issn.1001-9081.2022091447
Abstract288)   HTML13)    PDF (1829KB)(152)       Save

With the development of high-throughput sequencing technology, massive genome sequence data provide a data basis to understand the structure of genome. As an essential part of genomics research, splice site identification plays a vital role in gene discovery and determination of gene structure, and is of great importance for understanding the expression of gene traits. To address the problem that existing models cannot extract high-dimensional features of DNA (DeoxyriboNucleic Acid) sequences sufficiently, a splice site prediction model consisted of BERT (Bidirectional Encoder Representations from Transformers) and parallel Convolutional Neural Network (CNN) was constructed, namely BERT-splice. Firstly, the DNA language model was trained by BERT pre-training method to extract the contextual dynamic association features of DNA sequences and map DNA sequence features with a high-dimensional matrix. Then, the DNA language model was used to map the human reference genome sequence hg19 data into a high-dimensional matrix, and the result was adopted as input of parallel CNN classifier for retraining. Finally, a splice site prediction model was constructed on the basis of the above. Experimental results show that the prediction accuracy of BERT-splice model is 96.55% on the donor set of DNA splice sites and 95.80% on the acceptor set, which improved by 1.55% and 1.72% respectively, compared to that of the BERT and Recurrent Convolutional Neural Network (RCNN) constructed prediction model BERT-RCNN. Meanwhile, the average False Positive Rate (FPR) of donor/acceptor splice sites tested on five complete human gene sequences is 4.74%. The above verifies that the effectiveness of BERT-splice model for gene splice site prediction.

Table and Figures | Reference | Related Articles | Metrics